Scaling of the distribution of price fluctuations of 
individual companies 






Vasiliki Plerou 1,2, Parameswaran Gopikrishnan 1, Luis A. Nunes Amaral 1,
Martin Meyer 1, and H. Eugene Stanley 1
1 Center for Polymer Studies and Dept. of Physics, Boston University, Boston, MA 02215, USA
2 Department of Physics, Boston College, Chestnut Hill, MA 02167, USA
(Last modified: February 1, 2008. Printed: February 1, 2008) 



We present a phenomenological study of stock price fluctuations of individual companies. We
systematically analyze two different databases covering securities from the three major US stock
markets: (a) the New York Stock Exchange, (b) the American Stock Exchange, and (c) the National
Association of Securities Dealers Automated Quotation stock market. Specifically, we consider (i)
the trades and quotes database, for which we analyze 40 million records for 1000 US companies for
the 2-year period 1994-95, and (ii) the Center for Research in Security Prices database, for which
we analyze 35 million daily records for approximately 16,000 companies in the 35-year period
1962-96. We study the probability distribution of returns over varying time scales $\Delta t$,
where $\Delta t$ varies by a factor of $\approx 10^5$, from 5 min up to $\approx 4$ years. For time
scales from 5 min up to approximately 16 days, we find that the tails of the distributions can be
well described by a power-law decay, characterized by an exponent $\alpha \approx 3$, well outside
the stable Lévy regime $0 < \alpha < 2$. For time scales $\Delta t \gg (\Delta t)_\times \approx 16$
days, we observe results consistent with a slow convergence to Gaussian behavior. We also analyze
the role of cross-correlations between the returns of different companies and relate these
correlations to the distribution of returns for market indices.



I. INTRODUCTION 

The study of financial markets poses many challeng- 
ing questions. For example, how can one understand a 
strongly fluctuating system that is constantly driven by 
external information? And, how can one account for the 
role of the feedback between the markets and the outside 
world, or of the complex interactions between traders and 
assets? An advantage for the researcher trying to answer 
these questions is the availability of huge amounts of data 
for analysis. Indeed, the activities at financial markets 
result in several observables, such as the values of dif- 
ferent market indices, the prices of the different stocks, 
trading volumes, etc. 

Some of the most widely studied market observables are the values of market indices.
Previous empirical studies [1]-[12] show that the distribution of fluctuations of market
indices, measured by the returns, has slowly decaying tails and that the distributions
apparently retain the same functional form for a range of time scales [1], [7]. Fluctuations
in market indices reflect the average behavior of the price fluctuations of the companies
comprising them. For example, the S&P 500 is defined as the sum of the market
capitalizations (stock price multiplied by the number of outstanding shares) of 500
companies representative of the US economy.

Here, we focus on a more "microscopic" quantity: individual companies. We analyze the
tick-by-tick data [13] for the 1000 publicly traded US companies with the largest market
capitalizations and systematically study the statistical properties of their stock price
fluctuations. A preliminary study [14] reported that the distribution of the 5 min returns
for 1000 individual companies and the S&P 500 index decays as a power law with an exponent
$\alpha \approx 3$, well outside the stable Lévy regime ($\alpha < 2$). Earlier independent
studies on individual stock returns on longer time scales yield similar results [15]. These
findings raise the following questions:

First, how does the nature of the distribution of individual stock returns change with
increasing time scale $\Delta t$? In other words, does the distribution retain its power-law
functional form for longer time scales, or does it converge to a Gaussian, as found for
market indices [16]? If the distribution indeed converges to Gaussian behavior, how fast
does this convergence occur? For the S&P 500 index, for example, one finds the distribution
of returns to be consistent with a non-stable power-law functional form ($\alpha \approx 3$)
until approximately 4 days, after which an onset of convergence to Gaussian behavior is
found [16].

Second, why is it that the distribution of returns for 
individual companies and for the S&P 500 index have the 
same asymptotic form? This finding is unexpected, since 
the returns of the S&P 500 are the weighted sums of the 
returns of 500 companies. Hence, we would expect the 
S&P 500 returns to be distributed approximately as a 
Gaussian, unless there were significant dependencies be- 
tween the returns of different companies which prevent 
the central limit theorem from applying. 

To answer the first question, we extend previous work [14] on the distribution of 5 min
returns by performing an empirical analysis of individual company returns for time scales up
to 46 months. Our analysis uses two distinct databases detailed below. We find that the
cumulative distribution of individual-company returns is consistent with a power-law
asymptotic behavior with exponent $\alpha \approx 3$, which is outside the stable Lévy
regime. We also find that these distributions appear to retain the same functional form for
time scales up to approximately 16 days. For longer time scales, we observe results
consistent with a slow convergence to Gaussian behavior.

To answer the second question, we randomize each of the 500 time series of returns for the
500 constituent stocks of the S&P 500 index. A surrogate "index return" constructed from the
randomized time series shows fast convergence to a Gaussian. Further, we find that the
functional form of the distribution of returns remains unchanged for different system sizes
(measured by the market capitalization), while the standard deviation decays as a power law
of market capitalization.

The organization of this paper is as follows. Section II 
describes the databases studied and the data analyzed. 
Sections III, IV, and V present results for the distribu- 
tion of returns for individual companies for a wide range 
of time scales. Section VI discusses the role of cross- 
correlations between companies and possible reasons why 
market indices have statistical properties very similar to 
those of individual companies. Section VII contains some 
concluding remarks. 



II. THE DATA ANALYZED 

We analyze two different databases covering securities from the three major US stock
markets, namely (i) the New York Stock Exchange (NYSE), (ii) the American Stock Exchange
(AMEX), and (iii) the National Association of Securities Dealers Automated Quotation
(Nasdaq) stock market. The NYSE is the oldest stock exchange, tracing its origin to the
Buttonwood Agreement of 1792 [17]. The NYSE is an agency auction market; that is, trading
at the NYSE takes place by open bids and offers by Exchange members, acting as agents for
institutions or individual investors. Buy and sell orders are brought to the trading floor,
and prices are determined by the interplay of supply and demand. As of the end of November
1998, the NYSE lists over 3,100 companies. These companies have over $2 \times 10^{11}$
shares, worth approximately USD $10^{13}$, available for trading on the Exchange.

In contrast to the NYSE, Nasdaq uses computers and telecommunications networks, which create
an electronic trading system wherein the market participants meet over the computer rather
than face to face. Nasdaq's share volume reached $1.6 \times 10^{11}$ shares in 1997 and
dollar volume reached USD $4.4 \times 10^{12}$. As of December 1998, the Nasdaq Stock Market
listed over 5,400 US and non-US companies [18]. Nasdaq and AMEX merged in October 1998,
after the end of the period studied in this work.

The first database we consider is the trades and quotes (TAQ) database [19], for which we
analyze the 2-year period January 1994 to December 1995. The TAQ database, which has been
published by the NYSE since 1993, covers all trades at the three major US stock markets.
This huge database is available in the form of CD-ROMs. The rate of publication was 1 CD-ROM
per month for the period studied, but has recently increased to 2-3 CD-ROMs per month. The
total number of transactions for the largest 1000 stocks is of the order of $10^9$ in the
2-year period studied.

The second database we analyze is the Center for Research in Security Prices (CRSP)
database [20]. The CRSP Stock Files cover common stocks listed on the NYSE beginning in
1925, the AMEX beginning in 1962, and the Nasdaq Stock Market beginning in 1972. The files
provide complete historical descriptive information and market data including comprehensive
distribution information, high, low and closing prices, trading volumes, shares outstanding,
and total returns [21].

The CRSP Stock Files provide monthly data for NYSE 
beginning December 1925 and daily data beginning July 
1962. For the AMEX, both monthly and daily data be- 
gin in July 1962. For the Nasdaq Stock Market, both 
monthly and daily data begin in July 1972. 

We also analyze the S&P 500 index, which comprises 500 companies chosen for market size,
liquidity, and industry group representation in the US. In our study, we analyze data with a
recording interval of less than 1 min that cover the 13 years from January 1984 to December
1996. The total number of data points in this 13-year period exceeds $4.5 \times 10^6$.



III. THE DISTRIBUTION OF RETURNS FOR $\Delta t < 1$ DAY

The basic quantity studied for individual companies, $i = 1, 2, \ldots, 1000$, is the market
capitalization $S_i(t)$, defined as the share price multiplied by the number of outstanding
shares. The time $t$ runs over the working hours of the stock exchange, removing nights,
weekends and holidays [22]. For each company, we analyze the return

$$G_i \equiv G_i(t, \Delta t) \equiv \ln S_i(t + \Delta t) - \ln S_i(t). \qquad (1)$$

For small changes in $S_i(t)$, the return $G_i(t, \Delta t)$ is approximately the forward
relative change,

$$G_i(t, \Delta t) \approx \frac{S_i(t + \Delta t) - S_i(t)}{S_i(t)}. \qquad (2)$$
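As an illustration of the return definition, the following minimal sketch computes logarithmic returns at a lag of a given number of sampling steps and compares them with the forward relative change. The function names and the short capitalization series are hypothetical and are not taken from the databases analyzed here.

```python
import math

def log_returns(series, lag=1):
    """G(t, dt) = ln S(t + dt) - ln S(t), with dt equal to `lag` sampling steps."""
    return [math.log(series[t + lag]) - math.log(series[t])
            for t in range(len(series) - lag)]

def relative_changes(series, lag=1):
    """Forward relative change (S(t + dt) - S(t)) / S(t)."""
    return [(series[t + lag] - series[t]) / series[t]
            for t in range(len(series) - lag)]

# Hypothetical market capitalizations in arbitrary units (illustrative values only).
S = [100.0, 100.5, 100.2, 101.0, 100.8]
G = log_returns(S)
R = relative_changes(S)

for g, r in zip(G, R):
    print(f"log return {g:+.6f}   relative change {r:+.6f}")
```

Because logarithmic returns are additive, the sum of consecutive returns equals the return over the combined interval, which is convenient when moving to longer time scales.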

For time scales shorter than 1 day, we analyze the data from the TAQ database. We consider
the largest 1000 companies [23], in decreasing order of their market capitalization on the
first trading day, 3 January 1994. We sample the price of these 1000 companies at 5 min
intervals [24]. In order to obtain time series for market capitalization, we multiply the
stock price of each company by the number of outstanding shares for that company at each
sampling time. We thereby generate a time series, sampled at 5 min intervals, for the market
capitalization of each of the largest 1000 companies. Each of the 1000 time series has
approximately 40,000 data points, corresponding to the number of 5 min intervals in the
2-year period, or about 40 million data points in total. For each time series of market
capitalizations, we compute the 5 min returns using Eq. (1). We filter the data to remove
spurious events, such as those that occur due to the inevitable recording errors [25].



A. The distribution of returns for $\Delta t = 5$ min

Figure 1(a) shows the cumulative distributions of returns $G_i$ for $\Delta t = 5$ min (the
probability of a return larger than or equal to a threshold) for 10 individual companies
randomly selected from the 1000 companies that we analyze. For each company $i$, the
asymptotic behavior of the functional form of the cumulative distribution is "visually"
consistent with a power law,

$$P(G_i > x) \sim \frac{1}{x^{\alpha_i}}, \qquad (3)$$

where $\alpha_i$ is the exponent characterizing the power-law decay. In Fig. 1(b) we show
the histogram of $\alpha_i$, obtained from power-law regression fits to the positive tails
of the individual cumulative distributions of all 1000 companies studied. The histogram has
most probable value $\alpha_{MP} = 3$.

Next, we compute the time-averaged volatility $v_i \equiv v_i(\Delta t)$ of company $i$ as
the standard deviation of the returns over the 2-year period,

$$v_i^2 \equiv \langle G_i^2 \rangle_T - \langle G_i \rangle_T^2, \qquad (4)$$

where $\langle \cdots \rangle_T$ denotes a time average over the 40,000 data points of each
time series, for the 2-year period studied. Figure 1(a) suggests that the widths of the
individual distributions differ for different companies; indeed, companies with small values
of market capitalization are likely to fluctuate more. In order to compare the returns of
different companies with different volatilities, we define the normalized return
$g_i \equiv g_i(t, \Delta t)$ as

$$g_i \equiv \frac{G_i - \langle G_i \rangle_T}{v_i}. \qquad (5)$$

Figure 1(c) shows the ten cumulative distributions of the normalized returns $g_i$ for the
same ten companies as in Fig. 1(a). The distributions for all 1000 normalized returns $g_i$
have similar functional forms to these ten. Hence, to obtain better statistics, we compute a
single distribution of all the normalized returns. The cumulative distribution $P(g > x)$
shows a power-law decay [Fig. 2(a)],

$$P(g > x) \sim \frac{1}{x^{\alpha}}. \qquad (6)$$

Regression fits in the region $2 < g < 80$ yield

$$\alpha = \begin{cases} 3.10 \pm 0.03 & \text{(positive tail)} \\ 2.84 \pm 0.12 & \text{(negative tail)}. \end{cases} \qquad (7)$$

These estimates [26] of the exponent $\alpha$ are well outside the stable Lévy range, which
requires $0 < \alpha < 2$.
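The normalization and tail fit described above can be sketched as follows. This is only an illustration under stated assumptions: the returns are synthetic symmetric Pareto variates with a known tail exponent of 3 (not market data), the function names are hypothetical, and the fit is a plain least-squares regression of the empirical cumulative distribution on log-log axes, which may differ in detail from the regression actually used.

```python
import math
import random

def normalize(returns):
    """Subtract the time average and divide by the standard deviation."""
    n = len(returns)
    mean = sum(returns) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in returns) / n)
    return [(r - mean) / std for r in returns]

def tail_exponent(g, lo=2.0, hi=80.0):
    """Minus the least-squares slope of log P(g >= x) versus log x on lo <= x <= hi."""
    n = len(g)
    xs, ys = [], []
    for rank, x in enumerate(sorted(g, reverse=True), start=1):
        if x < lo:
            break                          # remaining values lie below the fit window
        if x <= hi:
            xs.append(math.log(x))
            ys.append(math.log(rank / n))  # empirical P(g >= x)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
             / sum((a - mx) ** 2 for a in xs))
    return -slope

# Synthetic stand-in for returns: symmetric Pareto variates with tail exponent 3.
rng = random.Random(0)
G = [rng.choice((-1.0, 1.0)) * rng.paretovariate(3.0) for _ in range(100_000)]
g = normalize(G)
alpha_pos = tail_exponent(g)                    # positive tail
alpha_neg = tail_exponent([-x for x in g])      # negative tail
print(alpha_pos, alpha_neg)
```

For this synthetic sample both estimates should lie near the input exponent; with real data the choice of fit window matters and should be checked against the plotted distribution.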

In order to obtain an alternative estimate for $\alpha$, we use the methods of Hill [27]. We
first calculate the inverse of the local logarithmic slope of $P(g)$,
$\zeta^{-1}(g) \equiv -d \log P(g) / d \log g$, where $g$ is rank-ordered. We then estimate
the asymptotic slope $\alpha$ by extrapolating $\zeta$ as a function of $1/g \to 0$.
Figure 3 shows the results for the negative and positive tails, for the 5 min returns for
individual companies, each using all returns larger than 5 standard deviations.
Extrapolation of the linear regression lines yields

$$\alpha = \begin{cases} 2.84 \pm 0.12 & \text{(positive tail)} \\ 2.73 \pm 0.13 & \text{(negative tail)}. \end{cases} \qquad (8)$$
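For orientation, the textbook form of the Hill estimator is sketched below. It is not the extrapolation procedure described above: it simply averages the logarithmic excesses of all events beyond a fixed threshold. The threshold value, the function name, and the synthetic Pareto sample are assumptions made for illustration.

```python
import math
import random

def hill_estimator(values, threshold):
    """Textbook Hill estimate of the tail exponent from all values above `threshold`."""
    tail = [v for v in values if v > threshold]
    if not tail:
        raise ValueError("no observations above the threshold")
    # alpha_hat = k / sum(ln(x_i / u)) over the k exceedances of the threshold u
    return len(tail) / sum(math.log(v / threshold) for v in tail)

# Synthetic sample with a known tail exponent of 3 (illustrative, not market data).
rng = random.Random(1)
sample = [rng.paretovariate(3.0) for _ in range(100_000)]
alpha_hill = hill_estimator(sample, threshold=5.0)
print(alpha_hill)
```

The estimate depends on the threshold; in practice it is worth repeating the calculation for several thresholds and checking that the result is stable.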



B. Scaling of the distribution of returns for $\Delta t < 1$ day

The next logical step would be to extend the previous procedure to time scales longer than
5 min. However, this approach leads to unreliable results, because the estimate of the
time-averaged volatility, used to define the normalized returns of Eq. (5), has estimation
errors that increase with $\Delta t$. For the distribution of 5 min returns, the previous
procedure relies on 40,000 data points per company for the estimation of the time-averaged
volatility. For 500 min returns, the number of data points available is reduced to 400 per
company, which leads to a much larger error in the estimate of $v_i(\Delta t)$.

To circumvent the difficulty arising from the large uncertainty in $v_i(\Delta t)$, we use
an alternative procedure for estimating the volatility [28], [29], [31], which relies on two
observations. The first is that volatility decreases with market capitalization [Fig. 4].
The second is that companies with similar market capitalization typically have similar
volatilities. Based on these observations, we make the hypothesis that the market
capitalization is the most influential factor in determining the volatility,

$$v_i = v_i(S, \Delta t). \qquad (9)$$



Hence, we group the returns of all the companies into "bins" according to the market
capitalization of each company at the beginning of the interval for which the return is
computed. We then compute the conditional probability of the $\Delta t$ returns for each of
the bins of market capitalization. We define $G_S \equiv G_S(t, \Delta t)$ as the $\Delta t$
returns of the subset of all companies with market capitalization $S$, and we then calculate
the cumulative conditional probability $P(G_S > x | S)$. Figure 5(a) shows $P(G_S > x | S)$
for 30 min returns for four different bins of $S$. The functional form of each of the four
distributions is consistent with a power law.

We define a normalized return

$$g_S \equiv g_S(t, \Delta t) \equiv \frac{G_S(\Delta t) - \langle G_S(\Delta t) \rangle_S}{v_S(\Delta t)}, \qquad (10)$$

where $\langle \cdots \rangle_S$ denotes an average over all returns of all companies with
market capitalization $S$. The average volatility $v_S \equiv v_S(\Delta t)$ is defined
through the relation

$$v_S^2 \equiv \langle G_S^2 \rangle_S - \langle G_S \rangle_S^2. \qquad (11)$$

We show in Fig. 5(b) the cumulative conditional probability of the normalized 30 min returns
$P(g_S > x | S)$ for the same four bins shown in Fig. 5(a). Visually, it seems clear that
these distributions have power-law functional forms with similar values of $\alpha$. Hence,
to obtain better statistics, we consider the normalized returns for all values of $S$ and
compute a single cumulative distribution.
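The binning procedure can be sketched as follows. The sketch assumes bins of uniform width in the logarithm of the market capitalization and uses synthetic Gaussian returns whose volatility decreases with size; the function name, the number of bins, and all parameter values are hypothetical choices for illustration.

```python
import math
import random
from collections import defaultdict

def normalize_by_size_bin(caps, returns, n_bins=4):
    """Return (normalized returns, {bin: (mean, volatility)}) for log-spaced size bins."""
    logs = [math.log10(s) for s in caps]
    lo, hi = min(logs), max(logs)
    width = (hi - lo) / n_bins               # assumes at least two distinct sizes

    def bin_of(x):
        return min(int((x - lo) / width), n_bins - 1)   # clamp the largest size

    groups = defaultdict(list)
    for x, G in zip(logs, returns):
        groups[bin_of(x)].append(G)

    stats = {}
    for b, Gs in groups.items():
        mean = sum(Gs) / len(Gs)
        # Bin volatility: square root of <G^2> - <G>^2 over all returns in the bin
        vol = math.sqrt(sum(G * G for G in Gs) / len(Gs) - mean * mean)
        stats[b] = (mean, vol)

    g = [(G - stats[bin_of(x)][0]) / stats[bin_of(x)][1]
         for x, G in zip(logs, returns)]
    return g, stats

# Synthetic data: volatility decreasing with size (illustrative parameters only).
rng = random.Random(2)
caps = [10 ** rng.uniform(7, 11) for _ in range(20_000)]
returns = [rng.gauss(0.0, 0.02 * (S / 1e9) ** -0.2) for S in caps]
g, stats = normalize_by_size_bin(caps, returns)
for b in sorted(stats):
    print(b, stats[b])
```

After this normalization each bin has zero mean and unit variance, so the returns from all bins can be pooled into a single distribution.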

Figure 6(a) shows the distribution of normalized 30 min returns. We test if our alternative
procedure of normalizing the returns by the time-averaged volatility for each bin of market
capitalization $S$ is consistent with the previous procedure of normalizing by the
time-averaged volatility for each company through Eq. (5). To this end, we also show in
Fig. 6(a) the distribution of normalized 30 min returns using the normalization of Eq. (5).
The distributions of returns obtained by both procedures are consistent with a power-law
decay of the same form as Eq. (6). Power-law regression fits to the positive tail yield
estimates of $\alpha = 3.21 \pm 0.08$ for the former method and $\alpha = 3.23 \pm 0.05$ for
the latter, confirming the consistency of the two procedures. The values of the exponent for
30 min time scales, $\alpha = 3.21 \pm 0.08$ (positive tail) and $\alpha = 3.01 \pm 0.12$
(negative tail), are also consistent with the estimates, Eq. (7), for 5 min normalized
returns.

Next, we compute the distribution of returns for longer time scales $\Delta t$. Figure 6(b)
shows the cumulative distribution of the normalized returns for time scales from 5 min up to
1 day. We observe good "data collapse" with consistent values of $\alpha$, which suggests
that the distribution of returns retains its functional form for larger $\Delta t$. The
scaling of the distribution of returns for individual companies is consistent with previous
results for the distribution of the S&P 500 index returns [1], [16]. The estimates of the
exponent $\alpha$ from power-law regression fits to the cumulative distribution and from the
Hill estimator are listed in Table I.



C. Scaling of the moments for $\Delta t < 1$ day

In the preceding subsection we reported that the distribution of returns retains the same
functional form for 5 min $< \Delta t <$ 1 day. We can further test this scaling behavior by
analyzing the moments of the distribution of normalized returns $g$,

$$\mu_k \equiv \langle |g|^k \rangle, \qquad (12)$$

where $\langle \cdots \rangle$ denotes an average over all the normalized returns for all
the bins. Since $\alpha \approx 3$, we expect $\mu_k$ to diverge for $k > 3$, and hence we
compute $\mu_k$ for $k < 3$.

Figure 6(c) shows the moments of the normalized returns $g$ for different time scales from
5 min up to 1 day. The moments do not vary significantly for the above time scales, thus
confirming the scaling behavior of the distribution observed in Fig. 6(b).
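A minimal sketch of the moment calculation follows. It assumes that returns at longer time scales are built by summing non-overlapping blocks of the shortest-scale returns (the source does not state whether windows overlap), and it uses synthetic Gaussian returns; the function names are hypothetical.

```python
import math
import random

def aggregate(returns, m):
    """Sum m consecutive returns over non-overlapping windows (log returns add)."""
    return [sum(returns[i:i + m]) for i in range(0, len(returns) - m + 1, m)]

def normalize(returns):
    """Zero mean and unit standard deviation."""
    n = len(returns)
    mean = sum(returns) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in returns) / n)
    return [(r - mean) / std for r in returns]

def abs_moment(g, k):
    """Fractional absolute moment: the average of |g|**k."""
    return sum(abs(x) ** k for x in g) / len(g)

# Synthetic 5 min returns (illustrative only); moments for several time scales.
rng = random.Random(4)
G5 = [rng.gauss(0.0, 1.0) for _ in range(40_000)]
for m in (1, 2, 4, 8):
    g = normalize(aggregate(G5, m))
    print(5 * m, "min:", [round(abs_moment(g, k), 3) for k in (0.5, 1.0, 1.5, 2.0, 2.5)])
```

If the distribution keeps its functional form across time scales, the rows printed for different window lengths should agree within sampling error.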



IV. THE DISTRIBUTION OF RETURNS FOR 1 DAY $< \Delta t <$ 16 DAYS

For time scales of 1 day or longer, we analyze data from the CRSP database. We analyze
approximately $3.5 \times 10^7$ daily records for about 16,000 companies for the 35-year
period 1962-96. We expect the market capitalization of a company to change dramatically in
such a long period of time. Further, we expect small companies to be more volatile than
large companies. Hence, large changes that might occur in the market capitalization of a
company will lead to large changes in its average volatility. To control for these changes
in market capitalization, we adopt the method that was used in the previous section for
$\Delta t > 5$ min.

Thus, we compute the cumulative conditional probability $P(G_S > x | S)$ that the return
$G_S \equiv G_S(t, \Delta t)$ is greater than $x$, for a given bin of average market
capitalization $S$. We first divide the entire range of $S$ into bins of uniform length in
logarithmic scale. We then compute a separate probability distribution for the returns $G_S$
which belong to a bin of average market capitalization $S$.

Figure 7(a) shows the cumulative distribution of daily returns $P(G_S > x | S)$ for
different values of $S$. Since the widths of these distributions are different for different
$S$, we analyze the normalized returns $g_S$, which were defined in Eq. (10).

Figure 7(b) shows the cumulative distribution $P(g_S > x)$ of the normalized daily returns
$g_S$. These distributions appear to have similar functional forms for different values of
$S$. In order to improve statistics, we compute a single cumulative distribution
$P(g_S > x)$ of the normalized returns for all $S$. We observe a power-law behavior of the
same form as Eq. (6). Regression fits yield estimates for the exponent,
$\alpha = 2.96 \pm 0.09$ for the positive tail and $\alpha = 2.70 \pm 0.10$ for the negative
tail.

Figure 8(a) compares the cumulative distributions of the normalized 1 day returns obtained
from the CRSP and TAQ databases. The estimates of the power-law exponents obtained from
regression fits are in good agreement for these two databases.

Figures 8(b,c) show the distributions of normalized returns for $\Delta t = 1, 4, 16$ days.
The estimates of the exponent $\alpha$ increase slightly in value for the positive tail,
while for the negative tail the estimates of $\alpha$ are approximately constant. The
increase in $\alpha$ for the positive tail is also reflected in the moments [Fig. 8(d)].



V. THE DISTRIBUTION OF RETURNS FOR $\Delta t > 16$ DAYS

The scaling behavior of the distributions of returns appears to break down for
$\Delta t > 16$ days, and we observe indications of slow convergence to Gaussian behavior.
In Figs. 9(a,b) we show the cumulative distributions of the normalized returns for
$\Delta t > 16$ days. For the positive tail, we find indications of convergence to a
Gaussian, while the negative tail appears not to converge. The convergence to Gaussian
behavior is also apparent from the behavior of the moments for these time scales
[Fig. 9(c)].

To summarize our results for the distribution of individual company returns, we find that
(i) the distribution of normalized returns for individual companies is consistent with a
power-law behavior characterized by an exponent $\alpha \approx 3$, (ii) the distributions
of returns retain the same functional form for a wide range of time scales $\Delta t$,
varying over 3 orders of magnitude, 5 min $< \Delta t <$ 6240 min = 16 days, and (iii) for
$\Delta t > 16$ days, the distribution of returns appears to slowly converge to a Gaussian
[Fig. 10].



VI. CROSS-CORRELATIONS 



In this section we address the second question that we posed initially. That is, why is it
that the distribution of returns for individual companies and for the S&P 500 index have the
same asymptotic form? In the previous sections, we presented evidence that the distribution
of returns scales for a wide range of time intervals. In a previous study [16], we
demonstrated that this scaling behavior is possibly due to time dependencies, in particular,
volatility correlations. Next, we will show that just as the time correlations lead to the
time scaling of the distributions of returns, so do cross-correlations among different
companies lead to a functional form of the distribution of returns of indices similar to
that for single companies.

A direct way of analyzing the cross-correlations is by computing the cross-correlation
matrix [32]-[34]. Here, we take a different approach, by analyzing the distribution of
returns as a function of market capitalization.

First, we compare the distributions of the S&P 500 index and that of individual companies.
Figures 11(a,b) show the cumulative distribution $P(g > x)$ for individual companies and for
the S&P 500 index. The distributions show the same power-law behavior for $2 < g < 80$. This
is surprising, because the distribution of index returns $G_{SP500}(t, \Delta t)$ does not
show convergence to Gaussian behavior, even though the 500 distributions of individual
returns $G_i(t, \Delta t)$ are not stable. Consider the family of index returns defined as
the partial sum [35]

$$G^{(N)}(t, \Delta t) \equiv \sum_{i=1}^{N} w_i G_i(t, \Delta t), \qquad (13)$$

where the weights $w_i \equiv S_i / \sum_{j=1}^{N} S_j$ have weak time dependencies [36].
From the central limit theorem for random variables with finite variance, we expect that the
probability distribution of $G^{(N)}$ would change systematically with $N$ and approach a
Gaussian for large $N$, provided there are no significant dependencies among the returns
$G_i$ for different $i$. Instead, we find that the distribution of $G^{(N)}$ has the same
asymptotic behavior as that for individual companies.

In order to show that the scaling behavior may be due to cross-correlations between
companies, we first destroy any existing dependencies among the returns of different
companies by randomizing each of the 1000 time series $G_i(t)$. By adding up the shuffled
series, we construct a shuffled index return $G^{(N)}_{sh}(t)$ out of statistically
independent companies with the same distribution of returns. Figure 11(c) shows the
cumulative distribution of the shuffled index returns $G^{(N)}_{sh}(t, \Delta t)$ for
increasing $N$ and $\Delta t = 5$ min. The distribution changes with $N$, and approaches a
Gaussian shape for large $N$, which indicates that the scaling in Fig. 11(a) is caused by
non-trivial dependencies between different companies.


VII. DISCUSSION 

We have presented a systematic analysis, on two different databases, of the distribution of
returns for individual companies for time scales $\Delta t$ ranging from 5 min up to
$\approx 4$ years. We find that the distribution of returns is consistent with a power-law
asymptotic behavior, characterized by an exponent $\alpha \approx 3$, well outside the
stable Lévy regime $0 < \alpha < 2$, for time scales up to approximately 16 days. For longer
time scales, the scaling behavior appears to break down and we observe "slow" convergence to
Gaussian behavior.

We also find that the distributions of returns of individual companies and of the S&P 500
index have the same asymptotic behavior. This scaling behavior does not hold when the
cross-correlations between companies are destroyed, suggesting the existence of correlations
between companies, as occurs in strongly interacting physical systems where power-law
correlations at the critical point result in scale-invariant properties. Recent studies of
the cross-correlation matrix using methods of random matrix theory [32]-[34] also show the
existence of correlations that are present through a wide range of time scales from 30 min
up to 1 day. These studies [32]-[34] show that the largest eigenvalue of the
cross-correlation matrix corresponds to correlations that pervade the entire market, and a
few other large eigenvalues correspond to clusters of companies that are correlated amongst
each other.
APPENDIX A: DEPENDENCE OF VOLATILITY ON SIZE

We find that the average volatility for each bin, $v_S(\Delta t)$, shows an interesting
dependence on the market capitalization. In Fig. 4, we plot the standard deviation as a
function of size on a log-log scale for $\Delta t = 1$ day. We find a power-law dependence
of the standard deviation of the returns on the market capitalization, with exponent
$\beta \approx 0.2$, very similar to the values reported for the annual sales of firms
[28], [29], the GDP of countries [31], and university research budgets [30]. For larger time
scales the exponent gradually decreases, approaching the value $\beta \approx 0.09$ for
$\Delta t = 1000$ days.
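The exponent of such a power-law dependence can be estimated by a straight-line fit on log-log axes. The sketch below does this by ordinary least squares for hypothetical bin sizes and volatilities that follow an exact power law with exponent 0.2; the numbers are illustrative and are not the values plotted in Fig. 4.

```python
import math

def power_law_exponent(sizes, vols):
    """Least-squares slope of log v versus log S; returns beta for v ~ S**(-beta)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in vols]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
             / sum((a - mx) ** 2 for a in xs))
    return -slope

# Hypothetical bin centres and volatilities obeying v = 0.05 * (S / 1e7)**(-0.2).
sizes = [1e7, 1e8, 1e9, 1e10, 1e11]
vols = [0.05 * (s / 1e7) ** -0.2 for s in sizes]
beta = power_law_exponent(sizes, vols)
print(beta)
```

With real data the fit range matters, since the power law may flatten for the largest capitalizations.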
[21] The CRSP links all former and current company identifiers to a unique permanent CRSP
identifier, allowing uninterrupted time-series analysis.

[22] The New York Stock Exchange is open from Monday through Friday, 9:30 a.m. to 4:00 p.m.
The time runs over the working hours only. Nights, weekends, and holidays are removed.

[23] Only the companies that existed throughout the 2-year period 1994-95 were considered.

[24] The trading frequency increases, on average, with market capitalization. For the
largest companies there are several trades that occur within each 5 min interval. On the
other hand, for the smallest companies we consider, the typical time between trades is of
the order of 5 min.

[25] The analyzed data are affected by several types of recording errors. The most common
error is missing digits, which appears as a large spike in the time series of returns. These
spikes are much larger than usual fluctuations and can be removed by choosing an appropriate
threshold. We tested a range of thresholds and found no effect on the results. Additionally,
we checked individually that the removed events correspond to missing digits in entering the
data. There are also stock splits and take-overs, which often occur overnight. To account
for these, we set to zero all overnight returns that are merely due to a change in the
number of outstanding shares.

[26] The errors on the exponent estimates are the errors given by the regression fits to the
cumulative distribution.
TABLE I. The values of the exponent $\alpha$ for different time scales $\Delta t$ obtained
by (a) power-law regression fit to the cumulative distribution, and (b) Hill estimator. The
non-daggered values are computed using the TAQ database, which contains tick data, while the
daggered values are computed using the CRSP database, which contains records with
$\Delta t = 1$ day and $\Delta t = 1$ month sampling. Note that we use the conversion
1 day = 390 min and 1 month = 22 days.

$\Delta t$ (min)   Power-law fit                  Hill estimator
                   Positive       Negative        Positive       Negative
5                  3.10 ± 0.03    2.84 ± 0.12     2.84 ± 0.12    2.73 ± 0.13
10                 3.32 ± 0.08    2.89 ± 0.13     3.14 ± 0.10    2.68 ± 0.14
20                 3.25 ± 0.08    2.75 ± 0.10     3.32 ± 0.18    2.41 ± 0.10
40                 3.28 ± 0.08    2.61 ± 0.10     3.39 ± 0.16    2.62 ± 0.11
80                 3.50 ± 0.13    2.49 ± 0.11     3.65 ± 0.26    2.53 ± 0.14
160                3.47 ± 0.08    2.42 ± 0.09     2.9 ± 0.4      2.53 ± 0.17
320                3.60 ± 0.10    2.54 ± 0.10     3.32 ± 0.08    3.19 ± 0.05
390†               2.96 ± 0.09    2.70 ± 0.10     3.05 ± 0.13    2.95 ± 0.15
780†               3.09 ± 0.03    2.62 ± 0.04     3.11 ± 0.09    2.90 ± 0.12
1560†              3.18 ± 0.05    2.75 ± 0.09     3.20 ± 0.08    2.90 ± 0.10
3120†              3.31 ± 0.08    2.71 ± 0.03     3.25 ± 0.06    2.94 ± 0.09
6240†              3.43 ± 0.04    2.74 ± 0.12     3.35 ± 0.04    2.93 ± 0.07
12480†             3.73 ± 0.04    2.63 ± 0.06     3.54 ± 0.05    2.93 ± 0.08
24960†             3.98 ± 0.09    2.78 ± 0.07     3.89 ± 0.09    3.00 ± 0.10
49920†             4.24 ± 0.09    2.84 ± 0.07     4.52 ± 0.22    3.10 ± 0.18
99840†             5.06 ± 0.07    3.01 ± 0.07     4.5 ± 0.6      2.92 ± 0.19
199680†            5.24 ± 0.12    3.32 ± 0.06     5.6 ± 1.0      3.14 ± 0.13
399360†            6.43 ± 0.29    3.48 ± 0.07     5.11 ± 0.03    3.45 ± 0.02
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Normalized price returns 

FIG. 1. (a) Cumulative distributions $P(g > x)$ for the positive
tails of 10 randomly selected companies. Note that they are all
consistent with a power-law asymptotic behavior. (b) The histogram of
the power-law exponents obtained by power-law regression fits to the
individual cumulative distribution functions, where the fit is for
all $x$ larger than 2 standard deviations. Note that this histogram
is not normalized: the y-axis indicates the number of occurrences of
the exponent. (c) Cumulative distributions of the 10 randomly chosen
companies in (a), scaled by the standard deviation calculated from
the entire 2-year period.

FIG. 2. (a) Cumulative distributions of the positive and negative
tails of the normalized returns of the 1000 largest companies in the
TAQ database for the 2-year period 1994-1995. The solid line is a
power-law regression fit in the region $2 < x < 80$. (b) Probability
density function of the normalized returns. The values in the center
of the distribution arise from the discreteness in stock prices,
which are set in units of fractions of USD, usually 1/8, 1/16, or
1/32. The solid curve is a power-law fit in the region $2 < x < 80$.
We find $\alpha = 3.10 \pm 0.03$ for the positive tail and
$\alpha = 2.84 \pm 0.12$ for the negative tail.
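
The regression fit described in this caption can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: the function name `fit_tail_exponent` is hypothetical, the data are synthetic Pareto samples with a known exponent of 3 (not TAQ returns, and not normalized by their standard deviation), and the fit is an unweighted least-squares line through log P(g > x) versus log x in the window 2 < x < 80.

```python
import math
import random

def fit_tail_exponent(samples, x_min=2.0, x_max=80.0):
    """Estimate alpha from a straight-line fit of log P(g > x) against log x."""
    xs = sorted(samples)
    n = len(xs)
    pts = []
    for i, x in enumerate(xs):
        p = (n - 1 - i) / n              # empirical fraction of samples above x
        if x_min < x < x_max and p > 0:  # keep only the fitting window
            pts.append((math.log(x), math.log(p)))
    mx = sum(u for u, _ in pts) / len(pts)
    my = sum(v for _, v in pts) / len(pts)
    sxy = sum((u - mx) * (v - my) for u, v in pts)
    sxx = sum((u - mx) ** 2 for u, _ in pts)
    return -sxy / sxx                    # P(g > x) ~ x^(-alpha), so alpha = -slope

# Synthetic check (assumption): Pareto samples with true exponent 3.
rng = random.Random(0)
synthetic = [rng.paretovariate(3.0) for _ in range(50000)]
alpha_fit = fit_tail_exponent(synthetic)
print(f"fitted alpha = {alpha_fit:.2f}")
```

On real data the result depends on the choice of fitting window, which is why the caption states the region explicitly.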

FIG. 3. The inverse local slope of $P(g)$,
$\zeta^{-1}(g) \equiv -\left(d \log P(g) / d \log g\right)^{-1}$, as
a function of the inverse normalized returns $1/g$ for (a) the
negative tail and (b) the positive tail. Each data point shown is an
average over 1000 events and the lines are linear regression fits to
the data. The linear regression fit over the range $0 < 1/g < 0.2$
yields the values of the inverse asymptotic slopes, $1/\alpha$; we
find $\alpha = 2.84 \pm 0.12$ for the positive tail and
$\alpha = 2.73 \pm 0.13$ for the negative tail. Note that the average
over all events used would be identical to the estimator for the
asymptotic slope proposed by Hill.
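
For reference, the Hill estimator mentioned at the end of this caption can be sketched in a few lines. This is a minimal illustration, not the procedure used to produce the figure: the function name `hill_alpha`, the choice of `k` (the number of largest events used), and the synthetic Pareto test data are all assumptions made here.

```python
import math
import random

def hill_alpha(samples, k):
    """Hill estimate of the tail exponent from the k largest samples."""
    xs = sorted(samples, reverse=True)
    threshold = xs[k]                    # the (k+1)-th largest value
    # Average log-excess over the threshold; this estimates 1/alpha.
    mean_log_excess = sum(math.log(x / threshold) for x in xs[:k]) / k
    return 1.0 / mean_log_excess

# Synthetic check (assumption): Pareto samples with true exponent 3.
rng = random.Random(1)
synthetic = [rng.paretovariate(3.0) for _ in range(50000)]
alpha_hill = hill_alpha(synthetic, k=2000)
print(f"Hill alpha = {alpha_hill:.2f}")
```

The estimate is sensitive to `k`: too few events give a noisy value, while too many include the non-asymptotic part of the distribution.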





FIG. 4. Log-log plot of the standard deviation of the distribution of
returns as a function of market capitalization for $\Delta t = 1$
day. Our preliminary data suggest a power-law dependence with
exponent $\beta \approx 0.2$. This value is not unlike what was
observed for firm sales ($\beta \approx 1/6$), GDP of countries
($\beta \approx 1/6$), and research budgets ($\beta \approx 1/4$).
For large values of market capitalization, this power law is followed
by a "flat" region.

FIG. 5. (a) Cumulative distribution of the conditional probability
$P(g > x \mid S)$ of the 30 min returns, for companies with market
capitalization $S$, from the TAQ database. We define uniformly spaced
bins on a logarithmic scale. We show the distribution of returns for
the 4 bins $10^{9.8} < S < 10^{10.2}$, $10^{10.2} < S < 10^{10.4}$,
$10^{10.4} < S < 10^{10.6}$, and $10^{10.6} < S < 10^{10.8}$.
(b) Cumulative conditional distributions of returns normalized by the
average volatility $v_S(\Delta t)$ of each bin. Note that we find the
same functional form for the different values of $S$.



FIG. 7. (a) Cumulative distribution of the conditional probability
$P(g > x \mid S)$ of the returns for companies with starting values
of market capitalization $S$ for $\Delta t = 1$ day from the CRSP
database. We define uniformly spaced bins on a logarithmic scale and
show the distribution of returns for the bins $10^5 < S < 10^6$,
$10^6 < S < 10^7$, $10^7 < S < 10^8$, and $10^8 < S < 10^9$.
(b) Cumulative conditional distributions of returns normalized by the
average volatility $v_S(\Delta t)$ of each bin.

FIG. 6. (a) Cumulative distribution of normalized returns for
$\Delta t = 30$ min. The filled squares show the distribution for
returns normalized by the time-averaged volatility for each company,
as defined in the text. The circles show the distribution for returns
normalized by the average volatility for each size bin, showing the
consistency of these two methods. (b) The distribution of returns for
different time scales $\Delta t < 1$ day. The exponents from the
power-law regression fits are summarized in Table I. (c) Fractional
moments for $0 < k < 3$ for the normalized returns for the same time
scales as in (b). Note that the moments are not converging to
Gaussian behavior; for example, at large $k$ the curve of moments for
$\Delta t = 80$ min lies to the right of that for $\Delta t = 320$
min. The thick full line shows the Gaussian moments.

FIG. 8. (a) Cumulative distribution of normalized daily returns
computed from the CRSP database contrasted with the same distribution
from the TAQ database, normalized by the average volatility.
Regression fits yield estimates $\alpha = 2.96 \pm 0.09$ (positive
tail) and $\alpha = 2.70 \pm 0.10$ (negative tail) for the CRSP data,
and $\alpha = 3.27 \pm 0.19$ (positive tail) and
$\alpha = 2.98 \pm 0.21$ (negative tail) for the TAQ data. The
regression fits were performed for the region $2 < g < 80$.
(b) Positive and (c) negative tails of the cumulative distribution of
normalized returns for $\Delta t = 1$, 4, and 16 days. Estimates of
the exponents are listed in Table I. (d) The fractional moments
$\mu_k \equiv \langle |g|^k \rangle$ for the normalized returns for
the same time scales. The thick full line shows the Gaussian moments.
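
As a small illustration of the comparison drawn in panel (d), the sketch below computes sample fractional moments $\langle |g|^k \rangle$ and the corresponding moments of a unit-variance Gaussian, $2^{k/2}\,\Gamma((k+1)/2)/\sqrt{\pi}$, which is a standard closed form. The function names and the synthetic Gaussian sample are assumptions for illustration; no data from the paper are used.

```python
import math
import random

def fractional_moment(values, k):
    """Sample estimate of <|g|^k> for returns g already normalized to unit variance."""
    return sum(abs(g) ** k for g in values) / len(values)

def gaussian_moment(k):
    """Exact <|g|^k> for a zero-mean, unit-variance Gaussian."""
    return 2 ** (k / 2) * math.gamma((k + 1) / 2) / math.sqrt(math.pi)

# Synthetic check (assumption): a Gaussian sample should track the reference curve.
rng = random.Random(2)
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(100000)]
for k in (0.5, 1.0, 1.5, 2.0, 2.5):
    print(k, fractional_moment(normal_sample, k), gaussian_moment(k))
```

For heavy-tailed returns, the sample moments at larger $k$ are dominated by a few extreme events, so deviations from the Gaussian curve are most visible there.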

FIG. 9. (a) Positive and (b) negative tails of the cumulative
distribution of the normalized returns for $\Delta t = 16$, 64, 256,
and 1024 days. The positive tail shows clear indication of
convergence to Gaussian behavior, whereas for the negative tail the
power-law behavior still seems to hold, although the statistics at
the tail are limited for the longer time scales. Estimates of the
exponents are listed in Table I. (c) The fractional moments $\mu_k$,
$0 < k < 3$, of the normalized returns for $\Delta t = 16$, 64, 256,
and 1024 days show clear indication of convergence to Gaussian
behavior with increasing $\Delta t$.

FIG. 10. The values of the exponent $\alpha$ characterizing the
asymptotic power-law behavior of the distribution of returns as a
function of the time scale $\Delta t$ obtained using (a) a power-law
fit, and (b) the Hill estimator. The values of $\alpha$ for
$\Delta t < 1$ day are calculated from the TAQ database, while for
$\Delta t \geq 1$ day they are calculated from the CRSP database.
The unshaded region, corresponding to time scales larger than
$(\Delta t)_\times \approx 16$ days (6240 min), indicates the range
of time scales where we find results consistent with slow convergence
to Gaussian behavior.





FIG. 11. (a) Positive and (b) negative tails of the cumulative
distribution for the normalized returns for the individual companies
and the S&P 500 index. Both distributions show the same functional
form, even though it is not a stable law. (c) Cumulative distribution
for the shuffled returns for $N = 1, 10, 100, 500$. The dotted curve
is the cumulative distribution for the S&P 500. With increasing $N$
the curves progressively approach a Gaussian, implying that without
the cross-dependencies between companies, the cumulative distribution
for the S&P 500 would be almost Gaussian.
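
One way to build shuffled returns of the kind shown in panel (c) is sketched below. It is a toy construction under stated assumptions, not the authors' procedure: the function names are hypothetical, each company's series is shuffled independently in time before averaging over $N$ companies, and the synthetic data consist of a shared Gaussian "market" component plus independent Gaussian noise. With Gaussian toy data the sketch only shows how shuffling removes the common component; it does not reproduce the tail behavior of the figure.

```python
import random

def shuffled_average(returns_by_company, n_companies, rng):
    """Average n_companies return series after shuffling each one independently in time."""
    shuffled = []
    for series in returns_by_company[:n_companies]:
        copy = list(series)              # leave the input series untouched
        rng.shuffle(copy)                # destroys alignment in time across companies
        shuffled.append(copy)
    length = len(shuffled[0])
    return [sum(s[t] for s in shuffled) / n_companies for t in range(length)]

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

# Synthetic data (assumption): a common factor plus company-specific noise.
rng = random.Random(3)
T, N = 2000, 50
market = [rng.gauss(0.0, 1.0) for _ in range(T)]
companies = [[market[t] + rng.gauss(0.0, 1.0) for t in range(T)] for _ in range(N)]

index_original = [sum(c[t] for c in companies) / N for t in range(T)]
index_shuffled = shuffled_average(companies, N, rng)
print(variance(index_original), variance(index_shuffled))
```

In this toy setting the shuffled average has a much smaller variance than the original one, because the shared component no longer adds coherently across companies.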